enable flash_attn_with_kvcache #68

micmelesse · 2024-07-10T15:29:23Z

This is an implementation of flash_attn_with_kvcache using a Triton flash attention decode kernel. It adds the following to the decode kernel:

key masking on the kvcache
in-place kv cache updates
causal masking
ALiBi
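
For readers unfamiliar with these options, here is a minimal pure-Python sketch of what the four features mean for a single head and a single sequence. It is not the Triton kernel from this PR. The helper name `decode_attention_ref` and its argument layout are hypothetical. The bottom-right causal alignment and the ALiBi bias form are assumed to follow the convention documented for the upstream flash_attn_with_kvcache, and the sketch assumes every query can see at least one key.

```python
import math

def decode_attention_ref(q, k_cache, v_cache, cache_seqlen,
                         k_new=None, v_new=None, causal=False, alibi_slope=None):
    """Hypothetical single-head reference; lists of float vectors, not tensors."""
    # In-place kv cache update: write new keys/values starting at cache_seqlen.
    if k_new is not None:
        for t, (k_t, v_t) in enumerate(zip(k_new, v_new)):
            k_cache[cache_seqlen + t] = list(k_t)
            v_cache[cache_seqlen + t] = list(v_t)
        cache_seqlen += len(k_new)

    seqlen_q, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q_i in enumerate(q):
        # Bottom-right alignment: the last query lines up with the last valid key.
        q_pos = i + cache_seqlen - seqlen_q
        scores = []
        # Key masking: slots at index >= cache_seqlen are never read.
        for j in range(cache_seqlen):
            if causal and j > q_pos:
                scores.append(-math.inf)  # future key, masked out
                continue
            s = scale * sum(a * b for a, b in zip(q_i, k_cache[j]))
            if alibi_slope is not None:
                s -= alibi_slope * abs(q_pos - j)  # ALiBi: linear distance penalty
            scores.append(s)
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        denom = sum(weights)
        out.append([sum(w * v_cache[j][c] for j, w in enumerate(weights)) / denom
                    for c in range(len(v_cache[0]))])
    return out

# Cache with room for 4 tokens; only the first 2 are valid, the rest is stale padding.
k_cache = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [9.0, 9.0]]
v_cache = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [9.0, 9.0]]

# Decode one token: its key/value is appended at slot 2 before attention runs.
out = decode_attention_ref([[1.0, 0.0]], k_cache, v_cache, cache_seqlen=2,
                           k_new=[[1.0, 0.0]], v_new=[[1.0, 0.0]], causal=True)
```

After the call, slot 2 of both caches holds the new token and slot 3 is untouched. The stale values in slot 3 never contribute to `out`, which is the effect of key masking.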

Future work will involve adding support for:

paged attention
local sliding window attention
rotary embedding

This is a combination of 11 commits. kvcache work This is a combination of 4 commits. kvcache is not supported save save decode save clean up merge save cases save save save save key mask on triton side fix q size issue test combos save

micmelesse added 25 commits June 26, 2024 11:38

Compress kvcache work

7be0b44

This is a combination of 11 commits. kvcache work This is a combination of 4 commits. kvcache is not supported save save decode save clean up merge save cases save save save save key mask on triton side fix q size issue test combos save

fix causal. use cache_seqlens

356d243

clean and test what works

3e3dfc1

some configs work on new_kv but fails on 1,8

5a3cb0d

cache overwrite correct

e611433

new_kv works more or less

737b701

test local

52eb402

work on paged kv attention

619c9ad

prefill paged attention

2d49406

fix has_batch_idx and skip local and rotatary emb

e5d13ef

save

6aa7caf

save

d46e730

save

63eb390

save

f0193a7

handle new_kv when paged kv cache

0e5223c

all except has_batch_idx works

962dc8a

major options are green

b334464

test all

0f3091c

add tests

c5be670

save

10fd70b

clean up

4c10a6b

minor clean up

753093d

simplest config

431fd7a

save debug true

70fce1e

save

3d73e88

micmelesse changed the title ~~enable kvcache~~ enable flash_attn_with_kvcache Jul 10, 2024

micmelesse requested review from scxiao and vgokhale July 10, 2024 19:47

micmelesse added 2 commits July 11, 2024 14:55

refactor slightly

8a393f6

save work

f681687

micmelesse added 8 commits July 30, 2024 17:04

cache index

5edf575

deal with nans on fwd_splitk

f4f476d

save

0e58a7c

causal working on basic case

076f5fe

causal works!

1defaaf

alibi works!

9004132

clean up

4b795dd

clean prefill changes

ad6413c

micmelesse marked this pull request as ready for review August 2, 2024 00:08

micmelesse added 4 commits August 2, 2024 11:40

remove bwd stuff

6415d9a

limit decode test to test_op_fwd

485ba55

add ref

6b6e533

use bfloat

e081f43

micmelesse merged commit 01a1329 into main_perf Aug 6, 2024

micmelesse deleted the micmelesse/enable_kvcache branch August 6, 2024 19:55
